Empirical Evaluation of Resampling Procedures for Optimising SVM Hyperparameters
Authors
Abstract
Tuning the regularisation and kernel hyperparameters is a vital step in optimising the generalisation performance of kernel methods, such as the support vector machine (SVM). This is most often performed by minimising a resampling/cross-validation based model selection criterion; however, there is little practical guidance on the most suitable form of resampling. This paper presents the results of an extensive empirical evaluation of resampling procedures for SVM hyperparameter selection, designed to address this gap in the machine learning literature. We tested 15 different resampling procedures on 121 binary classification data sets in order to select the best SVM hyperparameters. We used three very different statistical procedures to analyse the results: the standard multi-classifier/multi-data-set procedure proposed by Demšar, the confidence intervals on the excess loss of each procedure in relation to 5-fold cross-validation, and the Bayes factor analysis proposed by Barber. We conclude that a 2-fold procedure is appropriate to select the hyperparameters of an SVM for data sets of 1000 or more datapoints, while a 3-fold procedure is appropriate for smaller data sets.
Similar resources
An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold cross-validation error, using non-linear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to hyperparameters. We show that for large-scale problems involving a wide choice of k...
Evaluation of simple performance measures for tuning SVM hyperparameters
Choosing optimal hyperparameters for support vector machines is an important step in SVM design. This is usually done by minimizing either an estimate of generalization error or some other related performance measures. In this paper, we empirically study the usefulness of several simple performance measures that are very inexpensive to compute. The results point out which of these performance m...
Universum Learning for Multiclass SVM
We introduce Universum learning [1], [2] for multiclass problems and propose a novel formulation for the multiclass universum SVM (MU-SVM). We also propose a span bound for MU-SVM that can be used for model selection, thereby avoiding resampling. Empirical results demonstrate the effectiveness of MU-SVM and the proposed bound.
Empirical Likelihood Approach and its Application on Survival Analysis
A number of nonparametric methods exist for studying a population and its parameters when the distribution is unknown. Some of them, such as the resampling bootstrap method, are based on resampling from an initial sample. In this article the empirical likelihood approach is introduced as a nonparametric method for more efficient use of auxiliary information to construct...
An Improved Algorithm for SVMs Classification of Imbalanced Data Sets
Support Vector Machines (SVMs) have strong theoretical foundations and excellent empirical success in many pattern recognition and data mining applications. However, when induced by imbalanced training sets, where the examples of the target class (minority) are outnumbered by the examples of the non-target class (majority), the performance of the SVM classifier is less successful. In medical diag...
Journal:
- Journal of Machine Learning Research
Volume 18, Issue
Pages -
Publication date: 2017